[57] ABSTRACT

A real-time vector adaptive predictive coder approximates each
vector of K speech samples by using each of M fixed vectors in a
first codebook to excite a time-varying synthesis filter and picking
the vector that minimizes distortion. Predictive analysis for each
frame determines parameters used for computing, from the vectors in
the first codebook, zero-state response vectors that are stored at
the same address (index) in a second codebook. Encoding of input
speech vectors s_n is then carried out using the second codebook.
When the vector that minimizes distortion is found, its index is
transmitted to a decoder which has a codebook identical to the first
codebook of the encoder. There the index is used to read out a
vector that is used to synthesize an output speech vector. The
parameters used in the encoder are quantized, for example by using a
table, and their indices are transmitted to the decoder, where they
are decoded to specify transfer characteristics of the filters used
in producing the output speech vector from the receiver codebook
vector selected by the transmitted vector index.


12 Claims, 5 Drawing Sheets 
VECTOR ADAPTIVE PREDICTIVE CODER FOR
SPEECH AND AUDIO 

ORIGIN OF INVENTION

The invention described herein was made in the per- 
formance of work under a NASA contract, and is sub- 
ject to the provisions of Public Law 96-517 (35 USC 
202) under which the inventors were granted a request 
to retain title.

BACKGROUND OF THE INVENTION 

This invention relates to a real-time coder for compression of
digitally encoded speech or audio signals for transmission or
storage, and more particularly to a real-time vector adaptive
predictive coding system.

In the past few years, most research in speech coding has focused on
bit rates from 16 kb/s down to 150 bits/s. At the high end of this
range, it is generally accepted that toll quality can be achieved at
16 kb/s by sophisticated waveform coders which are based on scalar
quantization. N. S. Jayant and P. Noll, Digital Coding of Waveforms,
Prentice-Hall Inc., Englewood Cliffs, N.J., 1984. At the other end,
coders (such as linear-predictive coders) operating at 2400 bits/s
or below only give synthetic-quality speech. For bit rates between
these two extremes, particularly between 4.8 kb/s and 9.6 kb/s,
neither type of coder can achieve high-quality speech. Part of the
reason is that scalar quantization tends to break down at a bit rate
of 1 bit/sample. Vector quantization (VQ), through its theoretical
optimality and its capability of operating at a fraction of one bit
per sample, offers the potential of achieving high-quality speech at
9.6 kb/s or even at 4.8 kb/s. J. Makhoul, S. Roucos, and H. Gish,
“Vector Quantization in Speech Coding,” Proc. IEEE, Vol. 73, No. 11,
November 1985.

Vector quantization (VQ) can achieve a performance 
arbitrarily close to the ultimate rate-distortion bound if 
the vector dimension is large enough. T. Berger, Rate 
Distortion Theory, Prentice-Hall Inc., Englewood Cliffs,
N.J., 1971. However, only small vector dimensions can 
be used in practical systems due to complexity consider- 
ations, and unfortunately, direct waveform VQ using 
small dimensions does not give adequate performance. 
One possible way to improve the performance is to
combine VQ with other data compression techniques 
which have been used successfully in scalar coding 
schemes. 

In speech coding below 16 kb/s, one of the most successful scalar
coding schemes is Adaptive Predictive Coding (APC) developed by Atal
and Schroeder [B. S. Atal and M. R. Schroeder, “Adaptive Predictive
Coding of Speech Signals,” Bell Syst. Tech. J., Vol. 49, pp.
1973-1986, October 1970; B. S. Atal and M. R. Schroeder, “Predictive
Coding of Speech Signals and Subjective Error Criteria,” IEEE Trans.
Acoust., Speech, Signal Proc., Vol. ASSP-27, No. 3, June 1979; and
B. S. Atal, “Predictive Coding of Speech at Low Bit Rates,” IEEE
Trans. Comm., Vol. COM-30, No. 4, April 1982].

It is the combined power of VQ and APC that led to the development
of the present invention, a Vector Adaptive Predictive Coder (VAPC).
Such a combination of VQ and APC will provide high-quality speech at
bit rates between 4.8 and 9.6 kb/s, thus bridging the gap between
scalar coders and VQ coders.

The basic idea of APC is to first remove the redundancy in speech
waveforms using adaptive linear predictors, and then quantize the
prediction residual using a scalar quantizer. In VAPC, the scalar
quantizer in APC is replaced by a vector quantizer (VQ). The
motivation for using VQ is two-fold. First, although linear
dependency between adjacent speech samples is essentially removed by
linear prediction, adjacent prediction residual samples may still
have nonlinear dependency which can be exploited by VQ. Secondly, VQ
can operate at rates below one bit per sample. This is not
achievable by scalar quantization, but it is essential for speech
coding at low bit rates.
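The claim that VQ can operate below one bit per sample can be checked with simple arithmetic. The following is a minimal sketch that uses the example values given later in this description (a codebook of M = 128 vectors, each of K = 20 samples); the 8 kHz sampling rate is an assumption made only for illustration and is not stated in the text.

```python
import math

def vq_bits_per_sample(codebook_size: int, vector_dim: int) -> float:
    """Bits spent on the VQ index per speech sample."""
    return math.log2(codebook_size) / vector_dim

M = 128                 # codebook size (example value from the description)
K = 20                  # samples per vector (example value from the description)
SAMPLE_RATE_HZ = 8000   # ASSUMPTION: sampling rate is not given in the text

bits_per_sample = vq_bits_per_sample(M, K)          # 7 bits / 20 samples
index_rate_bps = bits_per_sample * SAMPLE_RATE_HZ   # VQ index rate only

print(f"{bits_per_sample:.2f} bits/sample, {index_rate_bps:.0f} b/s for VQ indices")
```

Under these assumptions the VQ indices alone cost about 0.35 bit per sample; the side information described below adds to this figure.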

The vector adaptive predictive coder (VAPC) has 
evolved from APC and the vector predictive coder 
introduced by V. Cuperman and A. Gersho, “Vector 
Predictive Coding of Speech at 16 kb/s,” IEEE Trans. 
Comm., Vol. COM-33, pp. 685-696, July 1985. VAPC 
contains some features that are somewhat similar to the 
Code-Excited Linear Prediction (CELP) coder by M. 
R. Schroeder, B. S. Atal, “Code-Excited Linear Predic- 
tion (CELP): High-Quality Speech at Very Low Bit 
Rates,” Proc. Intl. Conf. Acoustics, Speech, Signal
Proc., Tampa, March 1985, but with much less compu- 
tational complexity. 

In computer simulations, VAPC gives very good 
speech quality at 9.6 kb/s, achieving 18 dB of signal-to- 
noise ratio (SNR) and 16 dB of segmental SNR. At 4.8 
kb/s, VAPC also achieves reasonably good speech 
quality, and the SNR and segmental SNR are about 13 
dB and 11.5 dB, respectively. The computations required to achieve
these results are only on the order of 2 to 4 million flops per
second (one flop, a floating point operation, is defined as one
multiplication, one addition, plus the associated indexing), well
within the capability of today’s advanced digital signal processor
chips.
VAPC may become a low-complexity alternative to 
CELP, which is known to have achieved excellent 
speech quality at an expected bit rate around 4.8 kb/s 
but is not presently capable of being implemented in 
real-time due to its astronomical complexity. It requires 
over 400 million flops per second to implement the 
coder. In terms of the CPU time of a supercomputer 
CRAY-1, CELP requires 125 seconds of CPU time to 
encode one second of speech. There is currently a great 
need for a real-time, high-quality speech coder operat- 
ing at encoding rates ranging from 4.8 to 9.6 kb/s. In 
this range of encoding rates, the two coders mentioned 
above (APC and CELP) are either unable to achieve 
high quality or too complex to implement. In contrast, 
the present invention, which combines Vector Quanti- 
zation (VQ) with the advantages of both APC and 
CELP, is able to achieve high-quality speech with suffi- 
ciently low complexity for real-time coding. 

OBJECTS AND SUMMARY OF THE 
INVENTION 

An object of this invention is to encode in real time 
analog speech or audio waveforms into a compressed 
bit stream for storage and/or transmission, and subse- 
quent reconstruction of the waveform for reproduction. 

Another object is to provide adaptive postfiltering of
a speech or audio signal that has been corrupted by 
noise resulting from a coding system or other sources of 
degradation so as to enhance the perceived quality of 
said speech or audio signal. 

The objects of this invention are achieved by a system which
approximates each vector of K speech samples by using each of M
fixed vectors stored in a VQ codebook to excite a time-varying
synthesis filter and picking the best synthesized vector that
minimizes a perceptually meaningful distortion measure. The original
sampled speech is first buffered and partitioned into vectors and
frames of vectors, where each frame is partitioned into N vectors,
each vector having K speech samples. Predictive analysis of
pitch-filtering parameters (P), linear-predictive coefficient
filtering parameters (LPC), perceptual weighting filter parameters
(W) and residual gain scaling factor (G) for each of successive
frames of speech is then performed. The parameters determined in the
analyses are quantized and reset every frame for processing each
input vector s_n in the frame, except the perceptual weighting
parameter. A perceptual weighting filter responsive to the
parameters W is used to help select the VQ vector that minimizes the
perceptual distortion between the coded speech and the original
speech. Although not quantized, the perceptual weighting filter
parameters are also reset every frame.

After each frame is buffered and the above analysis is completed at
the beginning of each frame, M zero-state response vectors are
computed and stored in a zero-state response codebook. These M
zero-state response vectors are obtained by first setting to zero
the memory of an LPC synthesis filter and a perceptual weighting
filter in cascade with a scaling unit controlled by the factor G,
then controlling the respective filters with the quantized LPC
filter parameters and the unquantized perceptual weighting filter
parameters, and exciting the cascaded filters using one
predetermined and fixed vector quantization (VQ) codebook vector at
a time. The output vector of the cascaded filters for each VQ
codebook vector is then stored in a temporary zero-state response
codebook at the corresponding address, i.e., it is assigned the same
index in the temporary zero-state response codebook as the index of
the exciting vector in the VQ codebook.

In encoding each input vector s_n within a frame, a pitch-predicted
vector ŝ_n of the vector s_n is determined by processing the last
vector encoded as an index code through a scaling unit, LPC
synthesis filter and pitch predictor filter controlled by the
parameters QG, QLPC, QP and QPP for the frame. In addition, the
zero-input response of the cascaded filters (the ringing from
excitation of a previous vector) is first set in a zero-input
response filter. Once the pitch-predicted vector ŝ_n is subtracted
from the input signal vector s_n, and the difference vector d_n is
passed through the perceptual weighting filter to produce a filtered
difference vector f_n, the zero-input response vector in the
aforesaid zero-input response filter is subtracted from the output
of the perceptual weighting filter, namely the filtered difference
vector f_n, and the resulting vector v_n is compared with each of
the M stored zero-state response vectors in search of the one having
a minimum difference Δ or distortion.

The index (address) of the zero-state response vector that produces
the smallest distortion, i.e., that is closest to v_n, identifies
the best vector in the permanent VQ codebook. Its index (address) is
transmitted as the compressed vector code for the vector s_n, and is
used by a receiver which has a VQ codebook identical to that of the
transmitter to find the best-match vector. In the transmitter, that
best-match vector is used at the time of transmission of its index
to excite the LPC synthesis filter and pitch prediction filter to
generate an estimate ŝ_n of the next speech vector. The best-match
vector is also used to excite the zero-input response filter to set
it for the next input vector s_n to be processed as described above.
The indices of the best-match vectors for a frame of vectors are
combined in a multiplexer with the frame analysis information,
hereinafter referred to as “side information,” comprised of the
indices of quantized parameters which control pitch, pitch predictor
and LPC predictor filtering and the gain used in the coding process,
in order that it be used by the receiver in decoding the vector
indices of a frame into vectors using a codebook identical to the
permanent VQ codebook at the transmitter. This side information is
preferably transmitted through the multiplexer first, once for each
frame of VQ indices that follow, but it would be possible to first
transmit a frame of vector indices and then transmit the side
information, since the frames of vector indices will require some
buffering in either case; the difference is only in some initial
delay at the beginning of speech or audio frames transmitted in
succession. The resulting stream of multiplexed indices is
transmitted over a communication channel to a decoder, or stored for
later decoding.
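The frame layout described above (side information first, followed by the VQ indices of the frame) can be sketched as a simple bit-packing routine. This is only an illustration: the field names, the bit widths of the four side-information indices, and the example index values are hypothetical, since the text does not specify them; the 7-bit VQ index follows from the example codebook size of 128.

```python
# Order of the four side-information indices; the names are hypothetical labels.
SIDE_FIELDS = ("pitch", "pitch_predictor", "lpc_predictor", "gain")

def pack_frame(side_info, vq_indices, side_bits, vq_bits):
    """Concatenate side-information indices, then VQ indices, as a bit string."""
    bits = []
    for name in SIDE_FIELDS:
        bits.append(format(side_info[name], "0{}b".format(side_bits[name])))
    for index in vq_indices:
        bits.append(format(index, "0{}b".format(vq_bits)))
    return "".join(bits)

def unpack_frame(bits, n_vectors, side_bits, vq_bits):
    """Inverse of pack_frame: recover side information and VQ indices."""
    pos = 0
    side_info = {}
    for name in SIDE_FIELDS:
        width = side_bits[name]
        side_info[name] = int(bits[pos:pos + width], 2)
        pos += width
    vq_indices = []
    for _ in range(n_vectors):
        vq_indices.append(int(bits[pos:pos + vq_bits], 2))
        pos += vq_bits
    return side_info, vq_indices

# ASSUMPTION: illustrative bit widths and index values, not taken from the text.
side_bits = {"pitch": 7, "pitch_predictor": 6, "lpc_predictor": 10, "gain": 5}
side_info = {"pitch": 45, "pitch_predictor": 12, "lpc_predictor": 513, "gain": 9}
vq_indices = [38, 0, 127, 5, 64, 99, 1, 2]   # N = 8 vectors per frame

frame_bits = pack_frame(side_info, vq_indices, side_bits, vq_bits=7)
decoded_side, decoded_vq = unpack_frame(frame_bits, 8, side_bits, vq_bits=7)
```

Sending the side information first lets the receiver configure its filters before the first VQ index of the frame arrives, which matches the preferred ordering described above.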

In the decoder, the bit stream is first demultiplexed to 
separate the side information from the encoded vector 
indices that follow. Each encoded vector index is used 
at the receiver to extract the corresponding vector from 
the duplicate VQ codebook. The extracted vector is 
first scaled by the gain parameter, using a table to con- 
vert the quantized gain index to the appropriate scaling 
factor, and then used to excite cascaded LPC synthesis 
and pitch synthesis filters controlled by the same side 
information used in selecting the best-match index uti- 
lizing the zero-state response codebook in the transmit- 
ter. The output of the pitch synthesis filter is the coded 
speech, which is perceptually close to the original 
speech. All of the side information, except the gain 
information, is used in an adaptive postfilter to enhance 
the quality of the speech synthesized. This postfiltering 
technique may be used to enhance any voice or audio 
signal. All that would be required is an analysis section 
to produce the parameters used to make the postfilter 
adaptive. 

Other modifications and variations to this invention
may occur to those skilled in the art, such as variable- 
frame-rate coding, fast codebook searching, reversal of 
the order of pitch prediction and LPC prediction, and 
use of alternative perceptual weighting techniques. 
Consequently, the claims which define the present in- 
vention are intended to encompass such modifications 
and variations. 

Although the purpose of this invention is to encode 
for transmission and/or storage of analog speech or 
audio waveforms for subsequent reconstruction of the 
waveforms upon reproduction of the speech or audio 
program, reference is made hereinafter only to speech, 
but the invention described and claimed is applicable to 
audio waveforms or to sub-band filtered speech or 
audio waveforms. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1a is a block diagram of a Vector Adaptive Predictive Coding
(VAPC) processor embodying the present invention, and

FIG. 1b is a block diagram of a receiver for the encoded speech
transmitted by the system of FIG. 1a.

FIG. 2 is a schematic diagram that illustrates the adaptive
computation of vectors for a zero-state response codebook in the
system of FIG. 1a.

FIG. 3 is a block diagram of an analysis processor in the system of
FIG. 1a.

FIG. 4 is a block diagram of an adaptive postfilter of FIG. 1b.

FIG. 5 illustrates the LPC spectrum and the corresponding frequency
response of an all-pole postfilter 1/[1 − P(z/α)] for different
values of α. The offset between adjacent plots is 20 dB.

FIG. 6 illustrates the frequency responses of the postfilter
[1 − μz^(−1)][1 − P(z/β)]/[1 − P(z/α)] corresponding to the LPC
spectrum shown in FIG. 5. In both plots, α = 0.8 and β = 0.5. The
offset between the two plots is 20 dB.

DESCRIPTION OF PREFERRED 
EMBODIMENTS 

The preferred mode of implementation contemplates 
using programmable digital signal processing chips, 
such as one or two AT&T DSP32 chips, and auxiliary 
chips for the necessary memory and controllers for such
equipment as input sampling, buffering and multiplexing.
Since the system is digital, it is synchronized
throughout with the samples. For simplicity of illustra- 
tion and explanation, the synchronizing logic is not 
shown in the drawings. Also for simplification, at each 
point where a signal vector is subtracted from another, 
the subtraction function is symbolically indicated by an 
adder represented by a plus sign within a circle. The 
vector being subtracted is on the input labeled with a 
minus sign. In practice, the two’s complement of the 
subtrahend is formed and added to the minuend. How- 
ever, although the preferred implementation contem- 
plates programmable digital signal processors, it would 
be possible to design and fabricate special integrated 
circuits using VLSI techniques to implement the pres- 
ent invention as a special purpose, dedicated digital 
signal processor once the quantities needed would jus- 
tify the initial cost of design. 

Referring to FIG. 1a, original speech samples in digital
form from sampling analog-to-digital converter 10
are received by an analysis processor 11 which partitions
them into vectors s_n of K samples per vector, and
into frames of N vectors per frame. The analysis proces- 
sor stores the samples in a dual buffer memory which 
has the capacity for storing more than one frame of 
vectors, for example two frames of 8 vectors per frame, 
each vector consisting of 20 samples, so that the analysis 
processor may compute parameters used for coding the 
stored frame. As each frame is being processed out of 
one buffer, a new frame coming in is stored in the other 
buffer so that when processing of a frame has been 
completed, there is a new frame buffered and ready to 
be processed. 
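The partitioning of samples into frames of N vectors of K samples can be sketched as follows, using the example values N = 8 and K = 20 from the paragraph above. The function name is hypothetical, and the sketch ignores the dual-buffer timing; samples that do not yet fill a whole frame are simply left out, as they would remain in the incoming buffer.

```python
def partition_frames(samples, K=20, N=8):
    """Split a sample sequence into frames of N vectors, each of K samples.

    Trailing samples that do not fill a complete frame are not returned.
    """
    frame_len = K * N
    n_frames = len(samples) // frame_len
    frames = []
    for f in range(n_frames):
        base = f * frame_len
        frames.append(
            [samples[base + v * K: base + (v + 1) * K] for v in range(N)]
        )
    return frames

samples = list(range(400))          # stand-in for digitized speech samples
frames = partition_frames(samples)  # 400 samples -> 2 full frames of 160
```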

The analysis processor 11 determines the parameters of filters
employed in the Vector Adaptive Predictive Coding (VAPC) technique
that is the subject of this invention. These parameters are
transmitted through a multiplexer 12 as side information just ahead
of the frame of vector codes generated with the use of a permanent
vector quantization (VQ) codebook 13 and a zero-state response (ZSR)
codebook 14. The side information conditions the receiver to
properly filter decoded vectors of the frame. The analysis processor
11 also computes other parameters used in the encoding process. The
latter are represented in FIG. 1a by labeled lines, and consist of
sets of parameters which are designated W for a perceptual weighting
filter 18, a quantized LPC predictor QLPC for an LPC synthesis
filter 15, and quantized pitch QP and pitch predictor QPP for a
pitch synthesis filter 16. Also computed by the analysis processor
is a scaling factor G that is quantized to QG for control of a
scaling unit 17. The four quantized parameters transmitted as side
information are encoded for transmission using a quantizing table as
the quantized pitch index, pitch predictor index, LPC predictor
index and gain index. The manner in which the analysis processor
computes all of these parameters will be described with reference to
FIG. 3.

The multiplexer 12 preferably transmits the side information as soon
as it is available, although it could follow the frame of encoded
input vectors. While that is being done, M zero-state response
vectors are computed for the zero-state response (ZSR) codebook 14
in a manner illustrated in FIG. 2, which is to process each vector
in the VQ codebook 13 (e.g., 128 vectors) through a gain scaling
unit 17', an LPC synthesis filter 15', and a perceptual weighting
filter 18' corresponding to the gain scaling unit 17, the LPC
synthesis filter 15, and the perceptual weighting filter 18 in the
transmitter (FIG. 1a). Ganged commutating switches S1 and S2 are
shown to signify that the output for each fixed VQ vector processed
is stored at the same index (address) in the ZSR codebook.
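The computation of the ZSR codebook can be sketched as below. This is a minimal illustration, not the implementation of FIG. 2: the function names are hypothetical, the LPC synthesis filter is taken as 1/A(z) with A(z) = 1 − P(z), and the perceptual weighting filter is represented by generic numerator and denominator coefficients because its exact form is not given in this passage.

```python
def zero_state_filter(b, a, x):
    """Filter x through B(z)/A(z) (with a[0] == 1) starting from zero memory."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y

def build_zsr_codebook(vq_codebook, gain, lpc_a, w_b, w_a):
    """Scale, synthesize and weight every VQ vector, each from zero state.

    The response to VQ vector i is stored at index i of the result, so both
    codebooks share the same addressing.
    """
    zsr = []
    for codevector in vq_codebook:
        scaled = [gain * c for c in codevector]               # scaling unit
        synth = zero_state_filter([1.0], lpc_a, scaled)       # LPC synthesis
        zsr.append(zero_state_filter(w_b, w_a, synth))        # weighting
    return zsr

# Toy example: two 3-sample code vectors, a first-order predictor P(z) = 0.5 z^-1
# (so A(z) = 1 - 0.5 z^-1), and an identity weighting filter (ASSUMPTION).
vq_codebook = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
zsr_codebook = build_zsr_codebook(
    vq_codebook, gain=2.0, lpc_a=[1.0, -0.5], w_b=[1.0], w_a=[1.0]
)
```

Because each call starts from an all-zero history, the stored vectors contain no ringing from earlier excitation, which is what makes them zero-state responses.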

At the beginning of each codebook vector processing, the initial
conditions of the cascaded filters 15' and 18' are set to zero. This
simulates what the cascaded filters 15' and 18' will do with no
previous vector present from the corresponding VQ codebook. Thus, if
the output of a zero-input response filter 19 in the transmitter
(FIG. 1a) is held or stored at each step of computing the VQ code
index (to transmit for each vector of a frame), it is possible to
simplify encoding the speech vectors by subtracting the zero-input
response output from the vector f_n. In other words, assuming
M = 128, there are 128 different vectors permanently stored in the
VQ codebook to use in coding the original speech vectors s_n. Every
one of the 128 VQ vectors is read out in sequence and fed through
the scaling unit 17', the LPC synthesis filter 15', and the
perceptual weighting filter 18' shown in FIG. 2 without any history
of previous vector inputs (i.e., without any ringing due to
excitation by a preceding vector) by resetting those filters at each
step. The resulting filter output vector is then stored in a
corresponding location in the zero-state response codebook 14.
Later, while encoding input signal vectors s_n by finding the best
match between a vector v_n and all of the zero-state response vector
codes, it is necessary to subtract from a vector f_n derived from
the perceptual weighting filter a value that corresponds to the
effect of the previously selected VQ vector. That is done through
the zero-input response filter 19. The index (address) of the best
match is used as the compressed vector code transmitted for the
vector s_n. Of the 128 zero-state response vectors, there will be
only one that provides the best match, i.e., least distortion.
Assume it is in location 38 of the zero-state response codebook as
determined by a computer 20 labeled “compute norm.” An address
register 20a will store the index 38. It is that index that is then
transmitted as a VQ index to the receiver shown in FIG. 1b.

In the receiver, a demultiplexer 21 separates the side information,
which conditions the receiver with the same parameters as the
corresponding filters and scaling unit of the transmitter. The
receiver uses a decoder 22 to translate the parameter indices to
parameter values. The VQ index for each successive vector in the
frame addresses a VQ codebook 23 which is identical to the fixed VQ
codebook 13 of the transmitter. The LPC synthesis filter 24, pitch
synthesis filter 25, and scaling unit 26 are conditioned by the same
parameters which were used in computing the zero-state codebook
values, and which were in turn used in the process of selecting the
encoding index for each input vector. At each step of finding and
transmitting an encoding index, the zero-input response filter 19
computes from the VQ vector at the location of the index transmitted
a value to be subtracted from the input vector f_n to present a
zero-input response to be used in the best-match search.

There are various procedures that may be used to determine the best
match for an input vector s_n. The simplest is to store the
resulting distortion between each zero-state response vector code
output and the vector v_n with the index of that zero-state response
vector code. Assuming there are 128 vector codes stored in the
codebook 14, there would then be 128 resulting distortions stored in
a computer 20. Then, after all have been stored, a search is made in
the computer 20 for the lowest distortion value. The index (address)
of that lowest distortion value is then stored in a register 20a and
transmitted to the receiver as an encoded vector via the multiplexer
12, and to the VQ codebook for reading the corresponding VQ vector
to be used in the processing of the next input vector s_n.
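The exhaustive search just described can be sketched as follows. The squared Euclidean norm is assumed here as the distortion measure, consistent with the “compute norm” label but not spelled out in this passage; the function name and the toy vectors are hypothetical.

```python
def best_match_index(target, zsr_codebook):
    """Return (index, distortion) of the ZSR vector closest to the target.

    ASSUMPTION: distortion is the squared Euclidean norm of the difference.
    """
    best_index, best_distortion = -1, float("inf")
    for index, candidate in enumerate(zsr_codebook):
        distortion = sum((t - c) ** 2 for t, c in zip(target, candidate))
        if distortion < best_distortion:
            best_index, best_distortion = index, distortion
    return best_index, best_distortion

# Toy example with three 2-sample ZSR vectors and a target vector v_n.
zsr_codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
v_n = [0.9, 1.2]
index, distortion = best_match_index(v_n, zsr_codebook)
```

Tracking the running minimum avoids storing all M distortions before searching them; the result is the same as the store-then-search procedure in the text.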

In summary, it should be noted that the VQ codebook is used
(accessed) in two different steps: first, to compute vector codes
for the zero-state response codebook at the beginning of each frame,
using the LPC synthesis and perceptual weighting filter parameters
determined for the frame; and second, to excite the filters 15 and
16 through the scaling unit 17 while searching for the index of the
best-match vector, during which the estimate ŝ_n thus produced is
subtracted from the input vector s_n. The difference d_n is used in
the best-match search.

As the best match for each input vector s_n is found, the
corresponding predetermined and fixed vector from the VQ codebook is
used to reset the zero-input response filter 19 for the next vector
of the frame. The function of the zero-input response filter 19 is
thus to find the residual response of the gain scaling unit 17' and
filters 15' and 18' to previously selected vectors from the VQ
codebook. Thus, the selected vector is not transmitted; only its
index is transmitted, and is used to read out the selected vector
from a VQ codebook 23 identical to the VQ codebook 13 in the
transmitter.

The zero-input response filter 19 is the same filtering operation
that is used to generate the ZSR codebook 14, namely the combination
of a gain G, an LPC synthesis filter and a weighting filter, as
shown in FIG. 2. Once a best codebook vector match is determined,
the best-match vector is applied as an input to this filter (sample
by sample, sequentially). An input switch s_in is closed and an
output switch s_out is open during this time so that the first K
output samples are ignored (K is the dimension of the vector, and a
typical value of K is 20). As soon as all K samples have been
applied as inputs to the filter, the filter input switch s_in is
opened and the output switch s_out is closed. As the next K samples
of the vector f_n, the output of the perceptual weighting filter,
begin to arrive, the filter output samples are subtracted from the
samples of the vector f_n. The difference so generated is a set of K
samples forming the vector v_n, which is stored in a static register
for use in the ZSR codebook search procedure. In the ZSR codebook
search procedure, the vector v_n is subtracted from each vector
stored in the ZSR codebook, and the difference vector Δ is fed to
the computer 20 together with the index (or stored in the same
order, thereby to imply the index of the vector out of the ZSR
codebook). The computer 20 then determines which difference is the
smallest, i.e., which is the best match between the vector v_n and
each vector stored temporarily (for one frame of input vectors s_n).
The index of that best-match vector is stored in a register 20a.
That index is transmitted as a vector code and used to address the
VQ codebook to read the vector stored there into the scaling unit
17, as noted above. This search process is repeated for each vector
in the ZSR codebook, each time using the same vector v_n. Then the
best vector is determined.
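The reason the zero-input response can be subtracted once, and the search then carried out against zero-state responses only, is the linearity of the filters: the response of a filter with memory to a new input block equals its zero-input response plus its zero-state response. The sketch below checks this numerically. For brevity it uses a single all-pole filter with hypothetical coefficients in place of the gain, LPC synthesis and weighting cascade of FIG. 2.

```python
class AllPoleFilter:
    """1/A(z) with a[0] == 1; keeps its memory between calls to run()."""

    def __init__(self, a):
        self.a = list(a)
        self.memory = [0.0] * (len(self.a) - 1)   # most recent output first

    def run(self, x):
        y = []
        for sample in x:
            out = sample - sum(c * m for c, m in zip(self.a[1:], self.memory))
            self.memory = ([out] + self.memory)[: len(self.a) - 1]
            y.append(out)
        return y

A = [1.0, -0.9, 0.2]             # ASSUMPTION: illustrative stable coefficients
previous = [1.0, 0.5, -0.25]     # previously selected (scaled) code vector
current = [0.3, -0.7, 0.2]       # candidate code vector for the next block

# Full response: the filter still rings from the previous vector.
f_full = AllPoleFilter(A)
f_full.run(previous)
full = f_full.run(current)

# Zero-input response: same memory, but the input is switched off.
f_zir = AllPoleFilter(A)
f_zir.run(previous)
zir = f_zir.run([0.0] * len(current))

# Zero-state response: memory cleared, candidate vector applied.
zsr = AllPoleFilter(A).run(current)

max_error = max(abs(f - (zi + zs)) for f, zi, zs in zip(full, zir, zsr))
```

Since the zero-input term is the same for every candidate, it is computed once per input vector, while the M zero-state terms are computed once per frame.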

Referring now to FIG. 1b, it should be noted that the output of the
VQ codebook 23, which precisely duplicates the VQ codebook 13 of the
transmitter, is identical to the vector extracted from the
best-match index applied as an address to the VQ codebook 13; the
gain unit 26 is identical to the gain unit 17 in the transmitter,
and filters 24 and 25 exactly duplicate the filters 15 and 16,
respectively, except that at the receiver, the reconstructed
approximation of s_n rather than the prediction ŝ_n is taken as the
output of the pitch synthesis filter 25. The result, after
converting from digital to analog form, is synthesized speech that
reproduces the original speech with very good quality.

It has been found that by applying an adaptive postfil- 
ter 30 to the synthesized speech before converting it 
from digital to analog form, the perceived coding noise 
may be greatly reduced without introducing significant 
distortion in the filtered speech. FIG. 4 illustrates the 
organization of the adaptive postfilter as a long-delay 
filter 31 and a short-delay filter 32. Both filters are 
adaptive in that the parameters used in them are those 
received as side information from the transmitter, ex- 
cept for the gain parameter, G. The basic idea of adapt- 
ive post-filtering is to attenuate the frequency compo- 
nents of the coded speech in spectral valley regions. At 
low bit rates, a considerable amount of perceived cod- 
ing noise comes from spectral valley regions where 
there are no strong resonances to mask the noise. The 
postfilter attenuates the noise components in spectral 
valley regions to make the coding noise less perceiv- 
able. However, such filtering operation inevitably intro- 
duces some distortion to the shape of the speech spec- 
trum. Fortunately, our ears are not very sensitive to 
distortion in spectral valley regions; therefore, adaptive
postfiltering only introduces very slight distortion in 
perceived speech, but it significantly reduces the per- 
ceived noise level. The adaptive postfilter will be de- 
scribed in greater detail after first describing in more 
detail the analysis of a frame of vectors to determine the 
side information. 

FIG. 3 shows the organization of the initial analysis of block 11 in
FIG. 1a. The input speech samples s_n are first stored in a buffer
40 capable of storing, for example, more than one frame of 8
vectors, each vector having 20 samples.

Once a frame of input vectors s_n has been stored, the parameters to
be used, and their indices to be transmitted as side information,
are determined from that frame and at least a part of the previous
frame in order to perform analysis with information from more than
the frame of interest. The analysis is carried out as shown using a
pitch detector 41, pitch quantizer 42 and a pitch predictor
coefficient quantizer 43. What is referred to as “pitch” applies to
any observed periodicity in the input signal, which may not
necessarily correspond to the classical use of “pitch” corresponding
to vibrations in the human vocal folds. The direct output of the
speech is also used in the pitch predictor coefficient quantizer 43.
The quantized pitch (QP) and quantized pitch predictor (QPP) are
used to compute a pitch-prediction residual in block 44, and as
control parameters for the pitch synthesis filter 16 used as a
predictor in FIG. 1a. Only a pitch index and a pitch prediction
index are included in the side information to minimize the number of
bits transmitted. At the receiver, the decoder 22 will use each
index to produce the corresponding control parameters for the pitch
synthesis filter 25.

The pitch-prediction residual is stored in a buffer 45 for LPC
analysis in block 46. The LPC predictor from the LPC analysis is
quantized in block 47. The index of the quantized LPC predictor is
transmitted as a third one of four pieces of side information, while
the quantized LPC predictor is used as a parameter for control of
the LPC synthesis filter 15, and in block 48 to compute the rms
value of the LPC predictive residual. This value (unquantized
residual gain) is then quantized in block 49 to provide gain control
G in the scaling unit 17 of FIG. 1a. The index of the quantized
residual gain is the fourth part of the side information
transmitted.
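The residual gain computation and its table quantization can be sketched as below. The rms formula is standard; the nearest-entry rule and the contents of the gain table are assumptions made for illustration, since the text states only that a quantizing table is used.

```python
import math

def rms(values):
    """Root-mean-square value of a block of residual samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def quantize_to_table(value, table):
    """Return (index, table value) of the entry nearest to value.

    ASSUMPTION: nearest-entry quantization; the text does not give the rule.
    """
    index = min(range(len(table)), key=lambda i: abs(table[i] - value))
    return index, table[index]

residual = [3.0, -4.0, 3.0, -4.0]     # stand-in for the LPC prediction residual
gain_table = [1.0, 2.0, 4.0, 8.0]     # hypothetical quantizing table

unquantized_gain = rms(residual)      # sqrt(12.5), about 3.54
gain_index, quantized_gain = quantize_to_table(unquantized_gain, gain_table)
```

Only `gain_index` would be sent as side information; the receiver looks up the same table to recover the scaling factor.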

In addition to the foregoing, the analysis section provides LPC
analysis in block 50 to produce an LPC predictor from which the set
of parameters W for the perceptual weighting filter 18 (FIG. 1a) is
computed in block 51.

The adaptive postfilter 30 in FIG. 1b will now be
described with reference to FIG. 4. It consists of a
long-delay filter 31 and a short-delay filter 32 in
cascade. The long-delay filter is derived from the decoded
pitch-predictor information available at the receiver. It
attenuates frequency components between pitch
harmonic frequencies. The short-delay filter is derived
from LPC predictor information, and it attenuates the
frequency components between formant frequencies.

The noise masking effect of human auditory perception,
recognized by M. R. Schroeder, B. S. Atal, and J. L. Hall,
“Optimizing Digital Speech Coders by Exploiting
Masking Properties of the Human Ear,” J. Acoust.
Soc. Am., Vol. 66, No. 6, pp. 1647-1652, December
1979, is exploited in VAPC by using noise spectral
shaping. However, in noise spectral shaping, lowering
noise components at certain frequencies can only be
achieved at the price of increased noise components at
other frequencies. [B. S. Atal and M. R. Schroeder,
“Predictive Coding of Speech Signals and Subjective
Error Criteria,” IEEE Trans. Acoust., Speech, and
Signal Processing, Vol. ASSP-27, No. 3, pp. 247-254,
June 1979] Therefore, at bit rates as low as 4800 bps,
where the average noise level is quite high, it is very
difficult, if not impossible, to force noise below the
masking threshold at all frequencies. Since speech
formants are much more important to perception than
spectral valleys, the approach of the present invention is
to preserve the formant information by keeping the
noise in the formant regions as low as is practical during
encoding. Of course, in this case, the noise components
in spectral valleys may exceed the threshold; however,
these noise components can be attenuated later by the
postfilter 30. In performing such postfiltering, the
speech components in spectral valleys will also be
attenuated. Fortunately, the limen, or “just noticeable
difference,” for the intensity of spectral valleys can be quite
large [J. L. Flanagan, Speech Analysis, Synthesis, and
Perception, Academic Press, New York, 1972].
Therefore, by attenuating the components in spectral valleys,
the postfilter only introduces minimal distortion in the
speech signal, but it achieves a substantial noise
reduction.

Adaptive postfiltering has been used successfully in
enhancing ADPCM-coded speech. See V. Ramamoorthy
and N. S. Jayant, “Enhancement of ADPCM
Speech by Adaptive Postfiltering,” AT&T Bell Labs
Tech. J., pp. 1465-1475, October 1984; and N. S. Jayant
and V. Ramamoorthy, “Adaptive Postfiltering of 16
kb/s-ADPCM Speech,” Proc. ICASSP, pp. 829-832,
Tokyo, Japan, April 1986. The postfilter used by
Ramamoorthy, et al., supra, is derived from the two-pole
six-zero ADPCM synthesis filter by moving the poles
and zeros radially toward the origin. If this idea is
extended directly to an all-pole LPC synthesis filter
1/[1 - P(z)], the result is 1/[1 - P(z/α)] as the
corresponding postfilter, where 0 < α < 1. Such an all-pole
postfilter indeed reduces the perceived noise level;
however, sufficient noise reduction can only be
achieved with severe muffling in the filtered speech.
This is due to the fact that the frequency response of this
all-pole postfilter generally has a lowpass spectral tilt
for voiced speech.
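
By way of illustration only: because P(z) is a polynomial in z^-1,
replacing z by z/α multiplies the i-th predictor coefficient by α^i,
which moves every pole of the synthesis filter radially toward the
origin by the factor α. The following Python sketch checks this for a
hypothetical second-order predictor. The function names and the example
pole radius 0.95 and angle π/4 are assumptions chosen for the
illustration and do not appear in the disclosure.

```python
import cmath
import math


def expand_bandwidth(a, alpha):
    """Coefficients of P(z/alpha), given P(z) = sum_i a[i-1] * z**-i."""
    return [a_i * alpha ** (i + 1) for i, a_i in enumerate(a)]


def second_order_poles(a):
    """Poles of 1/[1 - a1 z^-1 - a2 z^-2], i.e. roots of z^2 - a1 z - a2."""
    a1, a2 = a
    disc = cmath.sqrt(a1 * a1 + 4.0 * a2)
    return ((a1 + disc) / 2.0, (a1 - disc) / 2.0)


# Hypothetical predictor whose synthesis filter has poles at r * exp(+/- j*theta).
r, theta = 0.95, math.pi / 4
a = [2.0 * r * math.cos(theta), -r * r]

scaled = expand_bandwidth(a, 0.8)          # alpha = 0.8
poles = second_order_poles(a)
scaled_poles = second_order_poles(scaled)

for before, after in zip(poles, scaled_poles):
    # The radius shrinks by alpha while the phase angle is unchanged.
    print(abs(before), abs(after), cmath.phase(before), cmath.phase(after))
```

For a predictor of higher order the same coefficient scaling applies;
only the root-finding step would need a general polynomial solver.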

The spectral tilt of the all-pole postfilter
1/[1 - P(z/α)] can be easily reduced by adding zeros
having the same phase angles as the poles but with
smaller radii. The transfer function of the resulting
pole-zero postfilter 32a has the form

$$H(z) = \frac{1 - P(z/\beta)}{1 - P(z/\alpha)}, \qquad 0 < \beta < \alpha < 1 \tag{1}$$

where α and β are coefficients empirically determined,
with some tradeoff between spectral peaks being so
sharp as to produce chirping and being so low as to not
achieve any noise reduction. The frequency response of
H(z) can be expressed as

$$20 \log \left| H(e^{j\omega}) \right| = 20 \log \left| \frac{1}{1 - P(e^{j\omega}/\alpha)} \right| - 20 \log \left| \frac{1}{1 - P(e^{j\omega}/\beta)} \right| \tag{2}$$

Therefore, in logarithmic scale, the frequency response 
of the pole-zero postfilter H(z) is simply the difference 
between the frequency responses of two all-pole postfil- 
ters. 

Typical values of α and β are 0.8 and 0.5,
respectively. From FIG. 5, it is seen that the response for
α = 0.8 has both formant peaks and spectral tilt, while
the response for α = 0.5 has spectral tilt only. Thus, with
α = 0.8 and β = 0.5 in Equation 2, we can at least
partially remove the spectral tilt by subtracting the
response for α = 0.5 from the response for α = 0.8. The
resulting frequency response of H(z) is shown in the
upper plot of FIG. 6.
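
By way of illustration only, the relation in Equation 2 can be checked
numerically. The Python sketch below evaluates the log-magnitude
response of the pole-zero postfilter directly from its transfer
function and compares it with the difference of two all-pole responses.
The predictor coefficients in A_EXAMPLE and the function names are
hypothetical; they are not the coefficients behind FIG. 5 or FIG. 6.

```python
import cmath
import math


def scaled_predictor(a, scale, omega):
    """Evaluate P(z/scale) at z = exp(j*omega), P(z) = sum_i a[i-1] z**-i."""
    z = cmath.exp(1j * omega)
    return sum(a_i * (scale / z) ** (i + 1) for i, a_i in enumerate(a))


def all_pole_db(a, scale, omega):
    """20 log10 |1 / (1 - P(e^{jw}/scale))|, one term of Equation 2."""
    return -20.0 * math.log10(abs(1.0 - scaled_predictor(a, scale, omega)))


def pole_zero_db(a, alpha, beta, omega):
    """20 log10 |H(e^{jw})| for H(z) = [1 - P(z/beta)] / [1 - P(z/alpha)]."""
    num = 1.0 - scaled_predictor(a, beta, omega)
    den = 1.0 - scaled_predictor(a, alpha, omega)
    return 20.0 * math.log10(abs(num / den))


# Hypothetical stable second-order predictor (pole radius about 0.77).
A_EXAMPLE = [1.2, -0.6]

for k in range(1, 8):
    w = k * math.pi / 8
    direct = pole_zero_db(A_EXAMPLE, 0.8, 0.5, w)
    difference = all_pole_db(A_EXAMPLE, 0.8, w) - all_pole_db(A_EXAMPLE, 0.5, w)
    print(f"{w:.3f} rad  {direct:+.3f} dB  {difference:+.3f} dB")
```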

In informal listening tests, it has been found that the
muffling effect was significantly reduced after the
numerator term [1 - P(z/β)] was included in the transfer
function H(z). However, the filtered speech remained
slightly muffled even with the spectral-tilt compensating
term [1 - P(z/β)]. To further reduce the muffling
effect, a first-order filter 32b was added which has a
transfer function of [1 - μz^-1], where μ is typically 0.5.
Such a filter provides a slightly highpassed spectral tilt
and thus helps to reduce muffling. This first-order filter
is used in cascade with H(z), and a combined frequency
response with μ = 0.5 is shown in the lower plot of FIG. 6.
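
By way of illustration only, the cascade of the pole-zero filter H(z)
and the first-order filter [1 - μz^-1] can be realized in the time
domain with two difference equations. The Python sketch below is a
minimal direct-form realization under the assumption of zero initial
filter state; the function name and the example predictor coefficients
are hypothetical, and the default values 0.8, 0.5 and 0.5 are the
typical values of α, β and μ quoted in the text.

```python
def short_delay_postfilter(x, a, alpha=0.8, beta=0.5, mu=0.5):
    """Filter x through [1 - P(z/beta)] / [1 - P(z/alpha)] * (1 - mu z^-1).

    a[i-1] is the i-th coefficient of P(z) = sum_i a_i z^-i.
    Filter memory is assumed to start at zero.
    """
    num = [a_i * beta ** (i + 1) for i, a_i in enumerate(a)]   # zeros
    den = [a_i * alpha ** (i + 1) for i, a_i in enumerate(a)]  # poles
    y = []    # output of the pole-zero section H(z)
    out = []  # output after the first-order tilt compensation
    for n, x_n in enumerate(x):
        acc = x_n
        for i in range(1, len(a) + 1):
            if n - i >= 0:
                acc -= num[i - 1] * x[n - i]
                acc += den[i - 1] * y[n - i]
        y.append(acc)
        previous = y[n - 1] if n > 0 else 0.0
        out.append(acc - mu * previous)
    return out


# Impulse response for a hypothetical second-order predictor.
impulse = [1.0] + [0.0] * 7
print(short_delay_postfilter(impulse, [1.2, -0.6]))
```

Setting β equal to α and μ to zero makes the cascade an identity, which
is a convenient sanity check for any implementation of this structure.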

The short-delay postfilter 32 just described basically
amplifies speech formants and attenuates inter-formant
valleys. To obtain the ideal postfilter frequency
response, we also have to amplify the pitch harmonics and
attenuate the valleys between harmonics. Such a
characteristic of frequency response can be achieved with a
long-delay postfilter using the information in the pitch
predictor.

In VAPC, we use a three-tap pitch predictor; the
pitch synthesis filter corresponding to such a pitch
predictor is not guaranteed to be stable. Since the poles of
such a synthesis filter may be outside the unit circle,
moving the poles toward the origin may not have the
same effect as in a stable LPC synthesis filter. Even if
the three-tap pitch synthesis filter is stabilized, its
frequency response may have an undesirable spectral tilt.
Thus, it is not suitable to obtain the long-delay postfilter
by scaling down the three tap weights of the pitch
synthesis filter.

With both poles and zeroes, the long-delay postfilter
can be chosen as

$$H_l(z) = C_g \, \frac{1 + \gamma z^{-p}}{1 - \lambda z^{-p}} \tag{3}$$

where p is determined by pitch analysis, and C_g is an
adaptive scaling factor.

Knowing the information provided by a single or
three-tap pitch predictor as the value b_2 or the sum
b_1 + b_2 + b_3, the factors γ and λ are determined
according to the following formulas:


$$\gamma = C_z f(x), \qquad \lambda = C_p f(x), \qquad 0 < C_z,\ C_p < 1 \tag{4}$$

where

$$f(x) = \begin{cases} 1 & \text{if } x > 1 \\ x & \text{if } U_{th} \le x \le 1 \\ 0 & \text{if } x < U_{th} \end{cases} \tag{5}$$

where U_th is a threshold value (typically 0.6)
determined empirically, and x can be either b_2 or b_1 + b_2 + b_3
depending on whether a one-tap or a three-tap pitch
predictor is used. Since a quantized three-tap pitch
predictor is preferred and therefore already available at
the VAPC receiver, x is chosen as

$$x = \sum_{i=1}^{3} b_i$$

in VAPC postfiltering. On the other hand, if the
postfilter is used elsewhere to enhance noisy input speech, a
separate pitch analysis is needed, and x may be chosen
as a single value b_2 since a one-tap pitch predictor
suffices. (The value b_2 when used alone indicates a value
from a single-tap predictor, which in practice would be
the same as a three-tap predictor when b_1 and b_3 are set
to zero.)

The goal is to make the power of {y(n)} about the
same as that of {s(n)}. An appropriate scaling factor is
chosen as

$$C_g = \frac{1 - \lambda/x}{1 + \gamma/x} \tag{6}$$
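
By way of illustration only, Equations 3 through 6 can be collected
into a short Python sketch that derives γ, λ and C_g from the pitch
predictor taps and then applies the long-delay postfilter as a
difference equation. Several points are assumptions made for the
sketch and are not taken from the disclosure: the function names, the
values 0.5 used for C_z and C_p in the example (the text only requires
them to lie between 0 and 1), the zero initial filter state, and the
choice of C_g = 1 when f(x) = 0 so that the filter reduces to an
identity for unvoiced input.

```python
def voicing_weight(x, u_th=0.6):
    """f(x) of Equation 5 with threshold U_th (0.6 is the typical value)."""
    if x > 1.0:
        return 1.0
    if x < u_th:
        return 0.0
    return x


def long_delay_params(b, c_z, c_p, u_th=0.6):
    """Return (gamma, lam, c_g) from the pitch predictor taps b.

    b holds (b1, b2, b3) for a three-tap predictor or (b2,) for one tap.
    """
    x = sum(b)
    fx = voicing_weight(x, u_th)
    gamma = c_z * fx                     # Equation 4
    lam = c_p * fx                       # Equation 4
    if fx == 0.0:
        # Assumption: with both taps at zero the postfilter is bypassed.
        return gamma, lam, 1.0
    c_g = (1.0 - lam / x) / (1.0 + gamma / x)   # Equation 6
    return gamma, lam, c_g


def long_delay_postfilter(s, p, gamma, lam, c_g):
    """y(n) = C_g [s(n) + gamma s(n-p)] + lam y(n-p), from Equation 3."""
    y = []
    for n, s_n in enumerate(s):
        delayed_in = s[n - p] if n >= p else 0.0
        delayed_out = y[n - p] if n >= p else 0.0
        y.append(c_g * (s_n + gamma * delayed_in) + lam * delayed_out)
    return y


# Hypothetical taps and scaling factors.
gamma, lam, c_g = long_delay_params((0.1, 0.6, 0.1), c_z=0.5, c_p=0.5)
print(gamma, lam, c_g)
print(long_delay_postfilter([1.0, 0.0, 0.0, 0.0, 0.0], 2, gamma, lam, c_g))
```

Whenever U_th ≤ x ≤ 1, f(x) equals x and Equation 6 simplifies to
C_g = (1 - C_p)/(1 + C_z), so the scaling factor is constant over that
range.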

The first-order filter 32b can also be made adaptive to
better track the change in the spectral tilt of H(z).
However, it has been found that even a fixed filter with
μ = 0.5 gives quite satisfactory results. A fixed value of
μ may be determined empirically.

To avoid occasional large gain excursions, an
automatic gain control (AGC) was added at the output of
the adaptive postfilter. The purpose of AGC is to scale
the enhanced speech such that it has roughly the same
power as the unfiltered noisy speech. It is comprised of
a gain (square root of power) estimator 33 operating on
the speech input ŝ(n), a gain (square root of power)
estimator 34 operating on the postfiltered output r(n), and a
circuit 35 to compute a scaling factor as the ratio of the
two gains. The postfiltering output r(n) is then
multiplied by this ratio in a multiplier 36. AGC is thus
achieved by estimating the square root of the power of
the unfiltered and filtered speech separately and then
using the ratio of the two values as the scaling factor.
Let {s(n)} be the sequence of either unfiltered or
filtered speech samples; then, the speech power σ²(n) is
estimated by using

$$\sigma^2(n) = \zeta\,\sigma^2(n-1) + (1-\zeta)\,s^2(n), \qquad 0 < \zeta < 1 \tag{7}$$

A suitable value of ζ is 0.99.
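
By way of illustration only, the AGC described above can be sketched in
Python as two recursive power estimators of the form of Equation 7
followed by a per-sample scaling by the ratio of the two gains. The
function name, the zero initial power estimates, and the choice of unit
gain while the postfiltered power estimate is still zero are
assumptions made for the sketch.

```python
import math


def agc(unfiltered, postfiltered, zeta=0.99):
    """Scale postfiltered samples to roughly the power of the unfiltered ones.

    Both power estimates follow Equation 7:
        sigma^2(n) = zeta * sigma^2(n-1) + (1 - zeta) * s^2(n)
    """
    power_in = 0.0    # estimate for the unfiltered speech (estimator 33)
    power_out = 0.0   # estimate for the postfiltered speech (estimator 34)
    scaled = []
    for s_n, r_n in zip(unfiltered, postfiltered):
        power_in = zeta * power_in + (1.0 - zeta) * s_n * s_n
        power_out = zeta * power_out + (1.0 - zeta) * r_n * r_n
        # Ratio of the two gains (square roots of power); unit gain is an
        # assumed fallback while no postfiltered energy has been seen.
        gain = math.sqrt(power_in / power_out) if power_out > 0.0 else 1.0
        scaled.append(gain * r_n)
    return scaled


# A postfilter output at half amplitude is restored to the input level.
clean = [math.sin(0.3 * n) for n in range(200)]
quiet = [0.5 * s for s in clean]
restored = agc(clean, quiet)
print(max(abs(r - s) for r, s in zip(restored, clean)))
```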

The complexity of the postfilter described in this 
section is only a small fraction of the overall complexity 
of the rest of the VAPC system, or any other coding 
system that may be used. In simulations, this postfilter 
achieves significant noise reduction with almost negligi- 
ble distortion in speech. To test for possible distorting 
effects, the adaptive postfiltering operation was applied 
to clean, uncoded speech and it was found that the
unfiltered original and its filtered version sound
essentially the same, indicating that the distortion introduced
by this postfilter is negligible.

It should be noted that although this novel postfilter- 
ing technique was developed for use with the present 
invention, its applications are not restricted to use with 
it. In fact, this technique can be used not only to en- 
hance the quality of any noisy digital speech signal but 
also to enhance the decoded speech of other speech 
coders when provided with a buffer and analysis section 
for determining the parameters. 

What has been disclosed is a real-time Vector Adapt- 
ive Predictive Coder (VAPC) for speech or audio 
which may be implemented with software using the 
commercially available AT&T DSP32 digital process- 
ing chip. In its newest version, this chip has a processing 
power of 6 million instructions per second (MIPS). To 
facilitate implementation for real-time speech coding, a 
simplified version of the 4800 bps VAPC is available. 
This simplified version has a much lower complexity, 
but gives nearly the same speech quality as a full com- 
plexity version. 

In the real-time implementation, an inner-product
approach is used for computing the norm (smallest
distortion) which is more efficient than the conventional
difference-square approach of computing the mean
square error (MSE) distortion. Given a test vector v
and M ZSR codebook vectors z_j, j = 1, 2, ..., M, the j-th
MSE distortion can be computed as

$$\| v - z_j \|^2 = \| v \|^2 - 2 \left[ v^T z_j - \tfrac{1}{2} \| z_j \|^2 \right] \tag{8}$$

At the beginning of each frame, it is possible to compute
and store ½‖z_j‖². With the DSP32 processor and for
the dimension and codebook size used, the
difference-square approach of the codebook search requires about
2.5 MIPS to implement, while the inner-product
approach only requires about 1.5 MIPS.
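
By way of illustration only: since ‖v‖² in Equation 8 does not depend
on j, minimizing the distortion is equivalent to maximizing the
bracketed term, which costs one inner product per codebook vector once
the half-energies have been stored. The Python sketch below compares
the two searches on a small hypothetical codebook. The function names
and the example vectors are assumptions, and indices are 0-based here
whereas the text counts from 1.

```python
def precompute_half_energies(codebook):
    """Per-frame table of 0.5 * ||z_j||^2 for every ZSR codebook vector."""
    return [0.5 * sum(c * c for c in z) for z in codebook]


def search_inner_product(v, codebook, half_energies):
    """Index maximizing v^T z_j - 0.5 ||z_j||^2 (Equation 8)."""
    best_index = 0
    best_score = float("-inf")
    for j, z in enumerate(codebook):
        score = sum(a * b for a, b in zip(v, z)) - half_energies[j]
        if score > best_score:
            best_index, best_score = j, score
    return best_index


def search_difference_square(v, codebook):
    """Index minimizing ||v - z_j||^2 by direct computation."""
    def distortion(j):
        return sum((a - b) ** 2 for a, b in zip(v, codebook[j]))
    return min(range(len(codebook)), key=distortion)


# Hypothetical two-dimensional ZSR codebook and test vector.
zsr_codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
half = precompute_half_energies(zsr_codebook)
test_vector = [0.4, 0.7]
print(search_inner_product(test_vector, zsr_codebook, half))
print(search_difference_square(test_vector, zsr_codebook))
```

The half-energy table is rebuilt once per frame, when the ZSR codebook
changes, and is then reused for every vector in that frame.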

The complexity of the VAPC is only about 3 million
multiply-adds/second and 6 k words of data memory.
However, due to the overhead in implementation, a
single DSP32 chip was not sufficient for implementing
the coder. Therefore, two DSP32 chips were used to
implement the VAPC. With a faster DSP32 chip now
available, which has an instruction cycle time of 160 ns
rather than 250 ns, it is expected that the VAPC can be
implemented using only one DSP32 chip.

What is claimed is: 

1. An improvement in the method for compressing
digitally encoded input speech or audio vectors at a
transmitter by using a scaling unit controlled by a
quantized residual gain factor QG, a synthesis filter
controlled by a set of quantized linear predictive coefficient
parameters QLPC, a pitch predictor controlled by pitch
and pitch predictor parameters QP and QPP, a
weighting filter controlled by a set of perceptual
weighting parameters W, and a permanent indexed
codebook containing a predetermined number M of
codebook vectors, each having an assigned codebook
index, to find an index which identifies the best match
between an input speech or audio vector s_n that is to be
coded and a synthesized vector ŝ_n generated from a
stored vector in said indexed codebook, wherein each
of said digitally encoded input vectors consists of a
predetermined number K of digitally coded samples,
comprising the steps of

buffering and grouping said input speech or audio
vectors into frames of vectors with a predetermined
number N of vectors in each frame,

performing an initial analysis for each successive
frame, said analysis including the computation of a
residual gain factor G, a set of perceptual
weighting parameters W, a pitch parameter P, a
pitch predictor parameter PP, and a set of said
linear predictive coefficient parameters LPC, and
the computation of quantized values QG, QP, QPP
and QLPC of parameters G, P, PP and LPC using
one or more indexed quantizing tables for the
computation of each quantized parameter or set of
parameters,

for each frame transmitting indices of said quantized
parameters QG, QP, QPP and QLPC determined
in the initial analysis step as side information about
vectors analyzed for later use in looking up in one
or more identical tables said quantized parameters
QG, QP, QPP and QLPC while reconstructing
speech and audio vectors from encoded vectors in
a frame, where each index for a quantized parameter
points to a location in one or more of said identical
tables where said quantized parameter may be
found,

computing a zero-state response vector from the vector
output of a zero-input response filter comprising
a scaling unit, synthesis filter and weighting
filter identical in operation to said scaling unit,
synthesis filter and weighting filter used for encoding
said input vectors, said zero-state response vector
being computed for each vector in said permanent
codebook by first setting to zero the initial
condition of said zero-input response filter so that
the response computed is not influenced by a preceding
one of said codebook vectors processed by
said zero-input response filter, and then using said
quantized values of said residual gain factor, set of
linear predictive coefficient parameters, and said
set of perceptual weighting parameters computed
in said initial analysis step by processing each vector
in said permanent codebook through said zero-input
response filter to compute a zero-state response
vector, and storing each zero-state response
vector computed in a zero-state response codebook
at or together with an index corresponding to the
index of said vector in said permanent codebook
used for this zero-state response computation step,
and

after thus performing an initial analysis of and computing
a zero-state response codebook for each
successive frame of input speech or audio vectors,
encode each input vector s_n of a frame in sequence
by transmitting the codebook index of the vector in
said permanent codebook which corresponds to
the index of a zero-state response vector in said
zero-state response codebook that best matches a
vector v_n obtained from an input vector s_n by
subtracting a long term pitch prediction vector ŝ_n
from the input vector s_n to produce a difference
vector d_n and filtering said difference vector d_n by
said perceptual weighting filter to produce a final
input vector f_n, where said long term pitch prediction
ŝ_n is computed by taking a vector from said
permanent codebook at the address specified by the
preceding particular index transmitted as a compressed
vector code and performing gain scaling of
this vector using said quantized gain factor QG,
then synthesis filtering the vector obtained from
said scaling using said quantized values QLPC of
said set of linear predictive coefficient parameters
to obtain a vector d̂_n and from vector d̂_n producing
a long term pitch predicted vector ŝ_n of the next
input vector s_n through a pitch synthesis filter using
said quantized values of pitch predictor parameters
QP and QPP, said long term prediction vector ŝ_n
being a prediction of the next input vector s_n, and
producing said vector v_n by subtracting from said
final input vector f_n the vector output of said
zero-input response filter generated in response to a
permanent codebook vector at the codebook address
of the last transmitted index code, said vector
output being generated by processing through said
zero input response filter, said permanent codebook
vector located at said last transmitted index
code where the output of said zero input response
filter is discarded while said permanent codebook
vector located at said last transmitted index code is
being processed sample by sample in sequence into
said zero input response filter until all samples of
said codebook vector have been entered, and
where the input of said zero input response filter is
interrupted after all samples of said codebook vector
have been entered and then the desired vector
output from said zero-input response filter is processed
out sample by sample for subtraction from
said final vector v_n, and
for each input vector s_n in a frame, finding the vector
stored in said zero-state response codebook which
best matches the vector v_n, thereby finding the best
match of a codebook vector with an input vector,
using an estimate vector ŝ_n produced from the best
match codebook vector found for the preceding
input vector,

having found the best match of said vector v_n with a
zero-state response vector in said zero-state response
codebook for an input speech or audio vector
s_n, transmit the zero-state response codebook
index of the current best-match zero-state response
vector as a compressed vector code of the current
input vector, and also use said index of the current
best-match zero-state response vector to select a
vector from said permanent codebook for computing
said long term pitch predicted input vector ŝ_n to
be subtracted from the next input vector s_n of the
frame.

2. An improvement as defined in claim 1, including a
method for reconstructing said input speech or audio
vectors from index coded vectors at a receiver, comprised
of decoding said side information transmitted for
each frame of index coded vectors, using the indices
received to address a permanent codebook identical to
said permanent codebook in said transmitter to successively
obtain decoded vectors, scaling said decoded
vectors by said quantized gain factor QG, and performing
synthesis filtering using said set of linear predictive
coefficient parameters and pitch prediction filtering
using said quantized pitch parameters QP and QPP to
produce approximation vectors ŝ_n of the original signal
vectors s_n.

3. An improvement as defined in claim 2 wherein said
receiver includes postfiltering of said approximation
vectors ŝ_n by long-delay postfiltering and short-delay
postfiltering in cascade, said quantized pitch and quantized
pitch predictor parameters controlling said long-term
postfiltering and said quantized linear predictive
coefficient parameters controlling said short-term
postfiltering, whereby adaptive postfiltered digitally encoded
speech or audio vectors are provided.

4. An improvement as defined in claim 3 including
automatic gain control of the adaptive postfiltered digitally
encoded speech or audio signal provided by
estimating the square root of the power of said postfiltered
speech or audio signal to obtain a value σ_2(n) of
said postfiltered speech or audio signal and estimating
the square root of the power of a postfiltering speech or
audio signal input to obtain a value σ_1(n) of decoded
input speech or audio vectors before postfiltering, and
controlling the gain of the postfiltered speech or audio
output signal by a scaling factor that is a ratio of σ_1(n)
to σ_2(n).

5. An improvement as defined in claim 4 wherein said
quantized gain factor, quantized pitch and quantized
pitch predictor parameters, and quantized linear predictive
coefficient parameters are derived from said side
information transmitted to said receiver.

6. An improvement as defined in claim 3 wherein
postfiltering is accomplished by using a transfer function
for said long-delay postfilter of the form

$$C_g \, \frac{1 + \gamma z^{-p}}{1 - \lambda z^{-p}}, \qquad C_g = \frac{1 - \lambda/x}{1 + \gamma/x}$$

where C_g is an adaptive scaling factor, p is the quantized
value QP of the pitch parameter P, and the factors γ and
λ are determined according to the following formulas

$$\gamma = C_z f(x), \qquad \lambda = C_p f(x), \qquad 0 < C_z,\ C_p < 1$$

where C_z and C_p are fixed scaling factors,

$$f(x) = \begin{cases} 1 & \text{if } x > 1 \\ x & \text{if } U_{th} \le x \le 1 \\ 0 & \text{if } x < U_{th} \end{cases}$$

U_th is an unvoiced threshold value, and x is a voicing
indicator parameter that is a function of coefficients b_1,
b_2 and b_3, where b_1, b_2, b_3 are coefficients of said
quantized pitch predictor QPP given by
P_1(z) = 1 - b_1 z^(-p+1) - b_2 z^(-p) - b_3 z^(-p-1), where z is the
inverse of the input delay operator z^-1 used in the z
transform representation of transfer functions.

7. An improvement as defined in claim 6 wherein
postfiltering is accomplished by using a transfer function
for said short-delay postfilter of the form

$$\frac{1 - P(z/\beta)}{1 - P(z/\alpha)}, \qquad 0 < \beta < \alpha < 1$$

where α and β are bandwidth expansion coefficients.

8. An improvement as defined in claim 7 wherein
postfiltering further includes in cascade first-order
filtering with a transfer function

$$1 - \mu z^{-1}, \qquad \mu < 1$$

where μ is a coefficient.

9. A postfiltering method for enhancing digitally
processed speech or audio signals comprising the steps of

buffering said speech or audio signals into frames
of vectors, each vector having K successive samples,

performing analysis of said buffered frames of speech
or audio signals in predetermined blocks to compute
linear predictive coefficients, pitch and pitch
predictor parameters, and

filtering each vector with long-delay and short-delay
postfiltering in cascade, said long-delay postfiltering
being controlled by said pitch and pitch predictor
parameters and said short-delay postfiltering
being controlled by said linear predictive coefficient
parameters, wherein postfiltering is accomplished
by using a transfer function for said short-delay
postfilter of the form

$$\frac{1 - P(z/\beta)}{1 - P(z/\alpha)}, \qquad 0 < \beta < \alpha < 1$$

where z is the inverse of the unit delay operator z^-1
used in the z transform representation of transfer functions,
and α and β are fixed scaling factors.

10. A postfiltering method as defined in claim 9 including
automatic gain control of the postfiltered digitally
encoded speech or audio signal provided by estimating
the square root of the power of said postfiltered
digitally encoded speech or audio signal to obtain a
value σ_2(n) of said postfiltered speech signal and estimating
the square root of the power of a postfiltering
input speech or audio signal to obtain a value σ_1(n) of
decoded input speech or audio signal before postfiltering,
and controlling the gain of the postfiltered speech
or audio signal by a scaling factor that is a ratio of σ_1(n)
to σ_2(n).

11. A postfiltering method as defined in claim 10
wherein postfiltering is accomplished by using a transfer
function for said long-delay postfilter of the form

$$C_g \, \frac{1 + \gamma z^{-p}}{1 - \lambda z^{-p}}$$

where C_g is an adaptive scaling factor, p is the quantized
value of the pitch parameter QP and the factors γ and λ
are adaptive bandwidth expansion parameters determined
according to the following formulas

$$\gamma = C_z f(x), \qquad \lambda = C_p f(x), \qquad 0 < C_z,\ C_p < 1$$

where C_z and C_p are fixed scaling factors and

$$f(x) = \begin{cases} 1 & \text{if } x > 1 \\ x & \text{if } U_{th} \le x \le 1 \\ 0 & \text{if } x < U_{th} \end{cases}$$

U_th is an unvoiced threshold value, and x is a voicing
indicator that is a function of coefficients b_1, b_2, b_3
where b_1, b_2, b_3 are coefficients of said quantized pitch
predictor QPP given by
P_1(z) = 1 - b_1 z^(-p+1) - b_2 z^(-p) - b_3 z^(-p-1), where z is the
inverse of the input delay operator z^-1 used in the z
transform representation of transfer functions.

12. A postfiltering method as defined in claim 11
wherein postfiltering further includes in cascade first-order
filtering with a transfer function

$$1 - \mu z^{-1}, \qquad \mu < 1$$

where μ is a coefficient.

***** 